Linear Regression and Correlation

Fernando B. Sabino da Silva

Predictions

trees <- read.delim("C:/Users/fsabino/Desktop/Codes/papers/Introductory_Stat_II/notebook/trees.txt")

Plots

library(mosaic)
splom(trees) # Scatter Plot Matrix

Simple Linear Regression

gf_point(Volume ~ Girth, data = trees) %>% gf_lm()

Model for a Linear Regression
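
For reference, the simple linear regression model is conventionally written as below. The symbols \(\alpha\), \(\beta\) and \(\sigma\) are an assumption, since the original notation is not shown here; normally distributed errors are the usual assumption behind the t-tests and confidence intervals that appear later.

```latex
y_i = \alpha + \beta x_i + \varepsilon_i,
\qquad \varepsilon_i \sim N(0, \sigma^2) \ \text{independent},
\quad i = 1, \dots, n
```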

Least Squares
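
A minimal sketch of the least-squares estimates computed by hand, assuming R's built-in `trees` data set matches the `trees.txt` file read earlier. The names `a_hat` and `b_hat` are illustrative only.

```r
# Assumption: the built-in `trees` data set matches trees.txt.
data(trees)

x <- trees$Girth
y <- trees$Volume

# Closed-form least-squares estimates for a simple linear regression
b_hat <- sum((x - mean(x)) * (y - mean(y))) / sum((x - mean(x))^2)  # slope
a_hat <- mean(y) - b_hat * mean(x)                                   # intercept

c(intercept = a_hat, slope = b_hat)

# The same estimates obtained from lm()
coef(lm(Volume ~ Girth, data = trees))
```

Under that assumption, both calls should agree with the `Estimate` column reported by `summary()` for `lm(Volume ~ Girth)`.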

Residuals
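
A short sketch of how residuals relate to fitted values, again assuming the built-in `trees` data. The object name `fit` is hypothetical and chosen to avoid clashing with `model` used elsewhere.

```r
# Assumption: the built-in `trees` data set matches trees.txt.
data(trees)
fit <- lm(Volume ~ Girth, data = trees)

# Residual = observed value minus fitted value
e <- trees$Volume - fitted(fit)

all.equal(unname(e), unname(resid(fit)))  # agrees with resid()
sum(e)        # numerically zero when the model includes an intercept
summary(e)    # compare with the "Residuals" block printed by summary(fit)
```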

Example:

library(plotly) 
data(USArrests)

model <- lm(Murder ~ Assault + Rape, data = USArrests)

x_grid <- unique(round(seq(min(USArrests$Assault), max(USArrests$Assault), l = 300), 0))
y_grid <- unique(round(seq(min(USArrests$Rape),    max(USArrests$Rape),    l = 300), 0))
grid   <- expand.grid(x = x_grid, y = y_grid)
z_grid <- coef(model)[1] + coef(model)[2]*grid$x + coef(model)[3]*grid$y

z_grid_aux <- matrix(z_grid, ncol = length(x_grid), byrow = TRUE)

est <- matrix(NA, nrow = max(y_grid), ncol = max(x_grid))
est[y_grid, x_grid] <- z_grid_aux

# Interactive plot (in the matrix, columns index the x axis and rows index the y axis)
USArrests %>%
  plot_ly(x = ~Assault, y = ~Rape, z = ~Murder) %>% 
  add_surface(x = NULL, y = NULL, z = ~est) %>%
  add_markers()

Estimating the Conditional Variance

Example in R

model <- lm(Volume ~ Girth, data = trees)
summary(model)
## 
## Call:
## lm(formula = Volume ~ Girth, data = trees)
## 
## Residuals:
##    Min     1Q Median     3Q    Max 
## -8.065 -3.107  0.152  3.495  9.587 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept) -36.9435     3.3651  -10.98 7.62e-12 ***
## Girth         5.0659     0.2474   20.48  < 2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 4.252 on 29 degrees of freedom
## Multiple R-squared:  0.9353, Adjusted R-squared:  0.9331 
## F-statistic: 419.4 on 1 and 29 DF,  p-value: < 2.2e-16
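
The "Residual standard error" line can be reproduced by hand. This is a sketch assuming the built-in `trees` data; `fit`, `sse` and `s` are illustrative names.

```r
# Assumption: the built-in `trees` data set matches trees.txt.
data(trees)
fit <- lm(Volume ~ Girth, data = trees)

n   <- nrow(trees)
sse <- sum(resid(fit)^2)     # sum of squared residuals
s   <- sqrt(sse / (n - 2))   # two estimated parameters, hence n - 2 degrees of freedom

s
sigma(fit)         # the same estimate as extracted by R
df.residual(fit)   # residual degrees of freedom
```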

Hypothesis Testing

Example

summary(model)
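
The `t value` and `Pr(>|t|)` entries for `Girth` correspond to a two-sided test of a zero slope. A sketch of that calculation, assuming the built-in `trees` data and using illustrative names:

```r
# Assumption: the built-in `trees` data set matches trees.txt.
data(trees)
fit <- lm(Volume ~ Girth, data = trees)

coefs <- coef(summary(fit))   # matrix: Estimate, Std. Error, t value, Pr(>|t|)
est   <- coefs["Girth", "Estimate"]
se    <- coefs["Girth", "Std. Error"]

t_obs <- (est - 0) / se       # test statistic for H0: slope = 0
p_val <- 2 * pt(abs(t_obs), df = df.residual(fit), lower.tail = FALSE)  # two-sided p-value

c(t = t_obs, p = p_val)
```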

Confidence Interval for the Slope

confint(model)
##                  2.5 %     97.5 %
## (Intercept) -43.825953 -30.060965
## Girth         4.559914   5.571799
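
The interval for `Girth` is the estimate plus or minus a t quantile times its standard error. A sketch under the assumption that the built-in `trees` data are the same data; `t_crit` and `ci` are illustrative names.

```r
# Assumption: the built-in `trees` data set matches trees.txt.
data(trees)
fit <- lm(Volume ~ Girth, data = trees)

est    <- unname(coef(fit)["Girth"])
se     <- coef(summary(fit))["Girth", "Std. Error"]
t_crit <- qt(0.975, df = df.residual(fit))   # 97.5% quantile of t with n - 2 df

ci <- est + c(-1, 1) * t_crit * se           # 95% confidence interval for the slope
ci
confint(fit, "Girth", level = 0.95)          # the same interval from confint()
```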

Correlation

cor(trees)
##            Girth    Height    Volume
## Girth  1.0000000 0.5192801 0.9671194
## Height 0.5192801 1.0000000 0.5982497
## Volume 0.9671194 0.5982497 1.0000000
cor(trees[,c("Height", "Girth", "Volume")])

This produces the same correlations as above, with the rows and columns reordered as Height, Girth, Volume.

cor(trees$Height, trees$Volume)
## [1] 0.5982497
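
As a sketch of what `cor()` computes, the correlation between `Girth` and `Volume` can be built from the covariance and standard deviations, and it ties back to the least-squares slope. This assumes the built-in `trees` data.

```r
# Assumption: the built-in `trees` data set matches trees.txt.
data(trees)
x <- trees$Girth
y <- trees$Volume

# Sample correlation: covariance scaled by both standard deviations
r <- cov(x, y) / (sd(x) * sd(y))
r
cor(x, y)

# The least-squares slope is the correlation rescaled by sd(y) / sd(x)
r * sd(y) / sd(x)
```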

R-squared: Reduction in Prediction Error

Graphical Illustration

\(r^2\): Reduction in Prediction Error

summary(model)
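
The "Multiple R-squared" value can be read as the proportional reduction in squared prediction error when moving from the mean of `Volume` to the regression line. A sketch assuming the built-in `trees` data, with illustrative names `sst`, `sse` and `r2`:

```r
# Assumption: the built-in `trees` data set matches trees.txt.
data(trees)
fit <- lm(Volume ~ Girth, data = trees)
y   <- trees$Volume

sst <- sum((y - mean(y))^2)   # prediction error using only the mean of y
sse <- sum(resid(fit)^2)      # prediction error using the regression line

r2 <- (sst - sse) / sst       # proportional reduction in prediction error
r2
cor(trees$Girth, y)^2         # equals the squared correlation in simple regression
summary(fit)$r.squared        # value reported by summary()
```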